Overview
Brought to you by YData
Dataset statistics
| | Dataset A | Dataset B |
|---|---|---|
| Number of variables | 12 | 12 |
| Number of observations | 446 | 446 |
| Missing cells | 442 | 437 |
| Missing cells (%) | 8.3% | 8.2% |
| Duplicate rows | 0 | 0 |
| Duplicate rows (%) | 0.0% | 0.0% |
| Total size in memory | 45.3 KiB | 45.3 KiB |
| Average record size in memory | 104.0 B | 104.0 B |
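The percentage and per-record figures in the table above can be recomputed from the raw counts. The sketch below assumes that `Missing cells (%)` is the number of missing cells divided by variables × observations, and that the average record size is the total memory divided by the number of observations. The helper names are illustrative and are not part of ydata-profiling.

```python
def missing_cells_pct(missing_cells: int, n_variables: int, n_observations: int) -> float:
    """Share of empty cells, as a percentage of all cells in the table."""
    total_cells = n_variables * n_observations
    return 100.0 * missing_cells / total_cells


def avg_record_size_bytes(total_kib: float, n_observations: int) -> float:
    """Average bytes per row, given the total size in KiB (1 KiB = 1024 B)."""
    return total_kib * 1024 / n_observations


# Values taken from the Dataset statistics table.
pct_a = missing_cells_pct(442, 12, 446)
pct_b = missing_cells_pct(437, 12, 446)
record_size = avg_record_size_bytes(45.3, 446)
print(f"A: {pct_a:.1f}%  B: {pct_b:.1f}%  record: {record_size:.1f} B")
```

Under these assumptions the recomputed values round to the figures reported for both datasets.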
Variable types
| | Dataset A | Dataset B |
|---|---|---|
| Numeric | 5 | 5 |
| Categorical | 4 | 4 |
| Text | 3 | 3 |
Alerts
| Dataset A | Dataset B | Alert type |
|---|---|---|
| Age has 93 (20.9%) missing values | Age has 85 (19.1%) missing values | Missing |
| Cabin has 348 (78.0%) missing values | Cabin has 351 (78.7%) missing values | Missing |
| PassengerId has unique values | PassengerId has unique values | Unique |
| Name has unique values | Name has unique values | Unique |
| SibSp has 310 (69.5%) zeros | SibSp has 301 (67.5%) zeros | Zeros |
| Parch has 340 (76.2%) zeros | Parch has 335 (75.1%) zeros | Zeros |
| Fare has 6 (1.3%) zeros | Fare has 8 (1.8%) zeros | Zeros |
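The percentages in the missing-value and zero alerts above are consistent with each count divided by the 446 observations. A minimal sketch that reproduces the wording of those lines; `alert_text` is a hypothetical helper written for illustration, not a ydata-profiling function:

```python
def alert_text(column: str, count: int, n_observations: int, kind: str) -> str:
    """Format an alert line: '<column> has <count> (<share>%) <kind>'."""
    share = count / n_observations
    return f"{column} has {count} ({share:.1%}) {kind}"


# Counts taken from the Dataset A column of the alerts.
print(alert_text("Age", 93, 446, "missing values"))
print(alert_text("SibSp", 310, 446, "zeros"))
```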
Reproduction
| | Dataset A | Dataset B |
|---|---|---|
| Analysis started | 2025-09-23 16:02:45.965654 | 2025-09-23 16:02:48.141995 |
| Analysis finished | 2025-09-23 16:02:48.139123 | 2025-09-23 16:02:50.267723 |
| Duration | 2.17 seconds | 2.13 seconds |
| Software version | ydata-profiling v0.0.dev0 | ydata-profiling v0.0.dev0 |
| Download configuration | config.json | config.json |
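The Duration row can be checked against the start and finish timestamps. A small sketch using only the Python standard library; it assumes the timestamps are in ISO 8601 form with a space separator, as printed above:

```python
from datetime import datetime


def duration_seconds(started: str, finished: str) -> float:
    """Elapsed seconds between two ISO-formatted timestamps."""
    start = datetime.fromisoformat(started)
    end = datetime.fromisoformat(finished)
    return (end - start).total_seconds()


# Timestamps copied from the Reproduction table.
duration_a = duration_seconds("2025-09-23 16:02:45.965654", "2025-09-23 16:02:48.139123")
duration_b = duration_seconds("2025-09-23 16:02:48.141995", "2025-09-23 16:02:50.267723")
print(f"A: {duration_a:.2f} s  B: {duration_b:.2f} s")
```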
Variables
PassengerId
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 446 | 446 |
| Distinct (%) | 100.0% | 100.0% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 443.85874 | 467.34978 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 1 | 3 |
| Maximum | 891 | 890 |
| Zeros | 0 | 0 |
| Zeros (%) | 0.0% | 0.0% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 1 | 3 |
| 5-th percentile | 50.5 | 51.25 |
| Q1 | 211.25 | 250.25 |
| median | 447.5 | 481.5 |
| Q3 | 668.5 | 683.5 |
| 95-th percentile | 851.5 | 845.75 |
| Maximum | 891 | 890 |
| Range | 890 | 887 |
| Interquartile range (IQR) | 457.25 | 433.25 |
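Range and IQR are derived from the other rows of the quantile table: the range is maximum minus minimum, and the IQR is Q3 minus Q1. A minimal sketch with the PassengerId values; the function names are illustrative only:

```python
def value_range(minimum: float, maximum: float) -> float:
    """Spread between the smallest and largest value."""
    return maximum - minimum


def interquartile_range(q1: float, q3: float) -> float:
    """Spread of the middle 50% of the values."""
    return q3 - q1


# PassengerId quantiles from the table above.
range_a, range_b = value_range(1, 891), value_range(3, 890)
iqr_a = interquartile_range(211.25, 668.5)
iqr_b = interquartile_range(250.25, 683.5)
print(range_a, range_b, iqr_a, iqr_b)
```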
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 258.60845 | 254.54825 |
| Coefficient of variation (CV) | 0.58263684 | 0.54466326 |
| Kurtosis | -1.2003823 | -1.1902288 |
| Mean | 443.85874 | 467.34978 |
| Median Absolute Deviation (MAD) | 228 | 216 |
| Skewness | 0.034350528 | -0.10317086 |
| Sum | 197961 | 208438 |
| Variance | 66878.333 | 64794.812 |
| Monotonicity | Not monotonic | Not monotonic |
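Several descriptive statistics are functions of one another: the mean is the sum divided by the number of observations, the variance is the square of the standard deviation, and the coefficient of variation (CV) is the standard deviation divided by the mean. The sketch below checks these relations for PassengerId in Dataset A, using the rounded values printed in the table, so small differences in the last digits are expected.

```python
import math

# PassengerId, Dataset A: values copied from the tables above.
n_observations = 446
total = 197961
std_dev = 258.60845

mean = total / n_observations   # Mean = Sum / number of observations
variance = std_dev ** 2         # Variance = (standard deviation)^2
cv = std_dev / mean             # CV = standard deviation / mean

print(f"mean={mean:.5f} variance={variance:.3f} cv={cv:.8f}")

# Compare with the reported figures, allowing for rounding in the inputs.
assert math.isclose(mean, 443.85874, abs_tol=1e-4)
assert math.isclose(variance, 66878.333, abs_tol=1e-2)
assert math.isclose(cv, 0.58263684, abs_tol=1e-6)
```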
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 124 | 1 | 0.2% |
| 477 | 1 | 0.2% |
| 569 | 1 | 0.2% |
| 642 | 1 | 0.2% |
| 523 | 1 | 0.2% |
| 546 | 1 | 0.2% |
| 259 | 1 | 0.2% |
| 111 | 1 | 0.2% |
| 530 | 1 | 0.2% |
| 99 | 1 | 0.2% |
| Other values (436) | 436 |
| Value | Count | Frequency (%) |
| 499 | 1 | 0.2% |
| 792 | 1 | 0.2% |
| 192 | 1 | 0.2% |
| 157 | 1 | 0.2% |
| 582 | 1 | 0.2% |
| 344 | 1 | 0.2% |
| 30 | 1 | 0.2% |
| 887 | 1 | 0.2% |
| 748 | 1 | 0.2% |
| 99 | 1 | 0.2% |
| Other values (436) | 436 |
| Value | Count | Frequency (%) |
| 1 | 1 | |
| 2 | 1 | |
| 4 | 1 | |
| 7 | 1 | |
| 9 | 1 | |
| 10 | 1 | |
| 12 | 1 | |
| 13 | 1 | |
| 16 | 1 | |
| 19 | 1 |
| Value | Count | Frequency (%) |
| 3 | 1 | |
| 5 | 1 | |
| 8 | 1 | |
| 9 | 1 | |
| 10 | 1 | |
| 14 | 1 | |
| 16 | 1 | |
| 22 | 1 | |
| 23 | 1 | |
| 24 | 1 |
| Value | Count | Frequency (%) |
| 3 | 1 | |
| 5 | 1 | |
| 8 | 1 | |
| 9 | 1 | |
| 10 | 1 | |
| 14 | 1 | |
| 16 | 1 | |
| 22 | 1 | |
| 23 | 1 | |
| 24 | 1 |
| Value | Count | Frequency (%) |
| 1 | 1 | |
| 2 | 1 | |
| 4 | 1 | |
| 7 | 1 | |
| 9 | 1 | |
| 10 | 1 | |
| 12 | 1 | |
| 13 | 1 | |
| 16 | 1 | |
| 19 | 1 |
Survived
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 2 | 2 |
| Distinct (%) | 0.4% | 0.4% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 1 | 1 |
| Median length | 1 | 1 |
| Mean length | 1 | 1 |
| Min length | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | 0 | 0 |
| 2nd row | 0 | 0 |
| 3rd row | 1 | 1 |
| 4th row | 0 | 1 |
| 5th row | 0 | 0 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 273 | |
| 1 | 173 |
| Value | Count | Frequency (%) |
| 0 | 278 | |
| 1 | 168 |
[Figure: Histogram of lengths of the category]
[Figure: Common Values (Plot), Dataset A and Dataset B]
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 273 | |
| 1 | 173 |
| Value | Count | Frequency (%) |
| 0 | 278 | |
| 1 | 168 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 446 |
| Value | Count | Frequency (%) |
| (unknown) | 446 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| 0 | 273 | |
| 1 | 173 |
| Value | Count | Frequency (%) |
| 0 | 278 | |
| 1 | 168 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 446 |
| Value | Count | Frequency (%) |
| (unknown) | 446 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| 0 | 273 | |
| 1 | 173 |
| Value | Count | Frequency (%) |
| 0 | 278 | |
| 1 | 168 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 446 |
| Value | Count | Frequency (%) |
| (unknown) | 446 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| 0 | 273 | |
| 1 | 173 |
| Value | Count | Frequency (%) |
| 0 | 278 | |
| 1 | 168 |
Pclass
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 3 | 3 |
| Distinct (%) | 0.7% | 0.7% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 1 | 1 |
| Median length | 1 | 1 |
| Mean length | 1 | 1 |
| Min length | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | 2 | 2 |
| 2nd row | 3 | 2 |
| 3rd row | 1 | 3 |
| 4th row | 3 | 1 |
| 5th row | 1 | 2 |
Common Values
| Value | Count | Frequency (%) |
| 3 | 252 | |
| 1 | 105 | |
| 2 | 89 | 20.0% |
| Value | Count | Frequency (%) |
| 3 | 244 | |
| 2 | 101 | |
| 1 | 101 |
[Figure: Histogram of lengths of the category]
[Figure: Common Values (Plot), Dataset A and Dataset B]
Most occurring characters
| Value | Count | Frequency (%) |
| 3 | 252 | |
| 1 | 105 | |
| 2 | 89 | 20.0% |
| Value | Count | Frequency (%) |
| 3 | 244 | |
| 2 | 101 | |
| 1 | 101 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 446 |
| Value | Count | Frequency (%) |
| (unknown) | 446 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| 3 | 252 | |
| 1 | 105 | |
| 2 | 89 | 20.0% |
| Value | Count | Frequency (%) |
| 3 | 244 | |
| 2 | 101 | |
| 1 | 101 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 446 |
| Value | Count | Frequency (%) |
| (unknown) | 446 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| 3 | 252 | |
| 1 | 105 | |
| 2 | 89 | 20.0% |
| Value | Count | Frequency (%) |
| 3 | 244 | |
| 2 | 101 | |
| 1 | 101 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 446 |
| Value | Count | Frequency (%) |
| (unknown) | 446 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| 3 | 252 | |
| 1 | 105 | |
| 2 | 89 | 20.0% |
| Value | Count | Frequency (%) |
| 3 | 244 | |
| 2 | 101 | |
| 1 | 101 |
Name
Text
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 446 | 446 |
| Distinct (%) | 100.0% | 100.0% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 82 | 67 |
| Median length | 49 | 48 |
| Mean length | 26.798206 | 27.109865 |
| Min length | 13 | 12 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 446 | 446 |
| Unique (%) | 100.0% | 100.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | Renouf, Mr. Peter Henry | Gaskell, Mr. Alfred |
| 2nd row | Doharr, Mr. Tannous | Carbines, Mr. William |
| 3rd row | Sagesser, Mlle. Emma | Gilnagh, Miss. Katherine "Katie" |
| 4th row | Lahoud, Mr. Sarkis | Thayer, Mrs. John Borland (Marian Longstreth Morris) |
| 5th row | Nicholson, Mr. Arthur Ernest | Sedgwick, Mr. Charles Frederick Waddington |
| Value | Count | Frequency (%) |
| mr | 258 | 14.3% |
| miss | 91 | 5.0% |
| mrs | 68 | 3.8% |
| william | 37 | 2.1% |
| john | 24 | 1.3% |
| henry | 20 | 1.1% |
| master | 18 | 1.0% |
| thomas | 12 | 0.7% |
| george | 12 | 0.7% |
| mary | 11 | 0.6% |
| Other values (891) | 1251 |
| Value | Count | Frequency (%) |
| mr | 261 | 14.4% |
| miss | 88 | 4.8% |
| mrs | 64 | 3.5% |
| william | 33 | 1.8% |
| master | 25 | 1.4% |
| john | 18 | 1.0% |
| charles | 15 | 0.8% |
| thomas | 15 | 0.8% |
| henry | 14 | 0.8% |
| george | 11 | 0.6% |
| Other values (913) | 1272 |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 1357 | 11.4% |
| r | 957 | 8.0% |
| e | 848 | 7.1% |
| a | 826 | 6.9% |
| n | 659 | 5.5% |
| i | 659 | 5.5% |
| s | 626 | 5.2% |
| M | 569 | 4.8% |
| l | 530 | 4.4% |
| o | 491 | 4.1% |
| Other values (50) | 4430 |
| Value | Count | Frequency (%) |
| (space) | 1371 | 11.3% |
| r | 964 | 8.0% |
| a | 839 | 6.9% |
| e | 837 | 6.9% |
| i | 699 | 5.8% |
| s | 646 | 5.3% |
| n | 635 | 5.3% |
| l | 557 | 4.6% |
| M | 556 | 4.6% |
| o | 487 | 4.0% |
| Other values (49) | 4500 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 11952 |
| Value | Count | Frequency (%) |
| (unknown) | 12091 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| (space) | 1357 | 11.4% |
| r | 957 | 8.0% |
| e | 848 | 7.1% |
| a | 826 | 6.9% |
| n | 659 | 5.5% |
| i | 659 | 5.5% |
| s | 626 | 5.2% |
| M | 569 | 4.8% |
| l | 530 | 4.4% |
| o | 491 | 4.1% |
| Other values (50) | 4430 |
| Value | Count | Frequency (%) |
| (space) | 1371 | 11.3% |
| r | 964 | 8.0% |
| a | 839 | 6.9% |
| e | 837 | 6.9% |
| i | 699 | 5.8% |
| s | 646 | 5.3% |
| n | 635 | 5.3% |
| l | 557 | 4.6% |
| M | 556 | 4.6% |
| o | 487 | 4.0% |
| Other values (49) | 4500 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 11952 |
| Value | Count | Frequency (%) |
| (unknown) | 12091 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| (space) | 1357 | 11.4% |
| r | 957 | 8.0% |
| e | 848 | 7.1% |
| a | 826 | 6.9% |
| n | 659 | 5.5% |
| i | 659 | 5.5% |
| s | 626 | 5.2% |
| M | 569 | 4.8% |
| l | 530 | 4.4% |
| o | 491 | 4.1% |
| Other values (50) | 4430 |
| Value | Count | Frequency (%) |
| (space) | 1371 | 11.3% |
| r | 964 | 8.0% |
| a | 839 | 6.9% |
| e | 837 | 6.9% |
| i | 699 | 5.8% |
| s | 646 | 5.3% |
| n | 635 | 5.3% |
| l | 557 | 4.6% |
| M | 556 | 4.6% |
| o | 487 | 4.0% |
| Other values (49) | 4500 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 11952 |
| Value | Count | Frequency (%) |
| (unknown) | 12091 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| (space) | 1357 | 11.4% |
| r | 957 | 8.0% |
| e | 848 | 7.1% |
| a | 826 | 6.9% |
| n | 659 | 5.5% |
| i | 659 | 5.5% |
| s | 626 | 5.2% |
| M | 569 | 4.8% |
| l | 530 | 4.4% |
| o | 491 | 4.1% |
| Other values (50) | 4430 |
| Value | Count | Frequency (%) |
| (space) | 1371 | 11.3% |
| r | 964 | 8.0% |
| a | 839 | 6.9% |
| e | 837 | 6.9% |
| i | 699 | 5.8% |
| s | 646 | 5.3% |
| n | 635 | 5.3% |
| l | 557 | 4.6% |
| M | 556 | 4.6% |
| o | 487 | 4.0% |
| Other values (49) | 4500 |
Sex
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 2 | 2 |
| Distinct (%) | 0.4% | 0.4% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 6 | 6 |
| Median length | 4 | 4 |
| Mean length | 4.7174888 | 4.6860987 |
| Min length | 4 | 4 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | male | male |
| 2nd row | male | male |
| 3rd row | female | female |
| 4th row | male | female |
| 5th row | male | male |
Common Values
| Value | Count | Frequency (%) |
| male | 286 | |
| female | 160 |
| Value | Count | Frequency (%) |
| male | 293 | |
| female | 153 |
[Figure: Histogram of lengths of the category]
[Figure: Common Values (Plot), Dataset A and Dataset B]
Most occurring characters
| Value | Count | Frequency (%) |
| e | 606 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 160 | 7.6% |
| Value | Count | Frequency (%) |
| e | 599 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 153 | 7.3% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 2104 |
| Value | Count | Frequency (%) |
| (unknown) | 2090 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| e | 606 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 160 | 7.6% |
| Value | Count | Frequency (%) |
| e | 599 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 153 | 7.3% |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 2104 |
| Value | Count | Frequency (%) |
| (unknown) | 2090 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| e | 606 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 160 | 7.6% |
| Value | Count | Frequency (%) |
| e | 599 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 153 | 7.3% |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 2104 |
| Value | Count | Frequency (%) |
| (unknown) | 2090 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| e | 606 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 160 | 7.6% |
| Value | Count | Frequency (%) |
| e | 599 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 153 | 7.3% |
Age
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 81 | 79 |
| Distinct (%) | 22.9% | 21.9% |
| Missing | 93 | 85 |
| Missing (%) | 20.9% | 19.1% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 30.459178 | 28.614044 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0.42 | 0.42 |
| Maximum | 80 | 80 |
| Zeros | 0 | 0 |
| Zeros (%) | 0.0% | 0.0% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
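For Age, the reported Distinct (%) does not equal the distinct count divided by all 446 observations. The figures are consistent with the denominator being the number of non-missing values. This is inferred from the numbers in this report and is not a confirmed description of the tool's implementation. A minimal sketch under that assumption, with an illustrative helper name:

```python
def distinct_pct(distinct: int, n_observations: int, missing: int) -> float:
    """Distinct values as a percentage of the non-missing observations (assumed denominator)."""
    non_missing = n_observations - missing
    return 100.0 * distinct / non_missing


# Age: distinct and missing counts from the table above.
age_a = distinct_pct(81, 446, 93)   # 81 distinct out of 353 non-missing values
age_b = distinct_pct(79, 446, 85)   # 79 distinct out of 361 non-missing values
print(f"A: {age_a:.1f}%  B: {age_b:.1f}%")
```

The same relation also matches the Cabin figures (88 distinct out of 98 non-missing values in Dataset A).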
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0.42 | 0.42 |
| 5-th percentile | 4 | 3 |
| Q1 | 21 | 19 |
| median | 29 | 27 |
| Q3 | 39 | 36 |
| 95-th percentile | 59.4 | 56 |
| Maximum | 80 | 80 |
| Range | 79.58 | 79.58 |
| Interquartile range (IQR) | 18 | 17 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 15.111016 | 14.972679 |
| Coefficient of variation (CV) | 0.49610716 | 0.52326329 |
| Kurtosis | 0.2718943 | 0.46267955 |
| Mean | 30.459178 | 28.614044 |
| Median Absolute Deviation (MAD) | 9 | 8 |
| Skewness | 0.47975377 | 0.4756362 |
| Sum | 10752.09 | 10329.67 |
| Variance | 228.34282 | 224.18112 |
| Monotonicity | Not monotonic | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 22 | 15 | 3.4% |
| 18 | 14 | 3.1% |
| 28 | 14 | 3.1% |
| 24 | 14 | 3.1% |
| 30 | 13 | 2.9% |
| 19 | 13 | 2.9% |
| 32 | 12 | 2.7% |
| 16 | 11 | 2.5% |
| 36 | 11 | 2.5% |
| 26 | 11 | 2.5% |
| Other values (71) | 225 | |
| (Missing) | 93 |
| Value | Count | Frequency (%) |
| 30 | 16 | 3.6% |
| 19 | 15 | 3.4% |
| 27 | 14 | 3.1% |
| 25 | 13 | 2.9% |
| 26 | 13 | 2.9% |
| 36 | 13 | 2.9% |
| 18 | 12 | 2.7% |
| 28 | 11 | 2.5% |
| 29 | 11 | 2.5% |
| 24 | 11 | 2.5% |
| Other values (69) | 232 | |
| (Missing) | 85 | 19.1% |
| Value | Count | Frequency (%) |
| 0.42 | 1 | 0.2% |
| 0.67 | 1 | 0.2% |
| 0.75 | 2 | 0.4% |
| 1 | 3 | |
| 2 | 5 | |
| 3 | 3 | |
| 4 | 4 | |
| 5 | 1 | 0.2% |
| 6 | 1 | 0.2% |
| 7 | 1 | 0.2% |
| Value | Count | Frequency (%) |
| 0.42 | 1 | 0.2% |
| 0.75 | 2 | 0.4% |
| 0.83 | 1 | 0.2% |
| 0.92 | 1 | 0.2% |
| 1 | 4 | |
| 2 | 8 | |
| 3 | 4 | |
| 4 | 6 | |
| 5 | 3 | 0.7% |
| 7 | 3 | 0.7% |
| Value | Count | Frequency (%) |
| 0.42 | 1 | 0.2% |
| 0.75 | 2 | 0.4% |
| 0.83 | 1 | 0.2% |
| 0.92 | 1 | 0.2% |
| 1 | 4 | |
| 2 | 8 | |
| 3 | 4 | |
| 4 | 6 | |
| 5 | 3 | 0.7% |
| 7 | 3 | 0.7% |
| Value | Count | Frequency (%) |
| 0.42 | 1 | 0.2% |
| 0.67 | 1 | 0.2% |
| 0.75 | 2 | 0.4% |
| 1 | 3 | |
| 2 | 5 | |
| 3 | 3 | |
| 4 | 4 | |
| 5 | 1 | 0.2% |
| 6 | 1 | 0.2% |
| 7 | 1 | 0.2% |
SibSp
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 7 | 7 |
| Distinct (%) | 1.6% | 1.6% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 0.4955157 | 0.54932735 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| Maximum | 8 | 8 |
| Zeros | 310 | 301 |
| Zeros (%) | 69.5% | 67.5% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| 5-th percentile | 0 | 0 |
| Q1 | 0 | 0 |
| median | 0 | 0 |
| Q3 | 1 | 1 |
| 95-th percentile | 2 | 3 |
| Maximum | 8 | 8 |
| Range | 8 | 8 |
| Interquartile range (IQR) | 1 | 1 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 1.0594582 | 1.1202074 |
| Coefficient of variation (CV) | 2.1380922 | 2.0392348 |
| Kurtosis | 19.023477 | 14.967694 |
| Mean | 0.4955157 | 0.54932735 |
| Median Absolute Deviation (MAD) | 0 | 0 |
| Skewness | 3.7702281 | 3.3831567 |
| Sum | 221 | 245 |
| Variance | 1.1224518 | 1.2548647 |
| Monotonicity | Not monotonic | Not monotonic |
Histogram with fixed size bins (bins=7)
| Value | Count | Frequency (%) |
| 0 | 310 | |
| 1 | 99 | 22.2% |
| 2 | 17 | 3.8% |
| 4 | 7 | 1.6% |
| 3 | 7 | 1.6% |
| 5 | 3 | 0.7% |
| 8 | 3 | 0.7% |
| Value | Count | Frequency (%) |
| 0 | 301 | |
| 1 | 104 | 23.3% |
| 2 | 14 | 3.1% |
| 4 | 11 | 2.5% |
| 3 | 10 | 2.2% |
| 5 | 3 | 0.7% |
| 8 | 3 | 0.7% |
| Value | Count | Frequency (%) |
| 0 | 310 | |
| 1 | 99 | 22.2% |
| 2 | 17 | 3.8% |
| 3 | 7 | 1.6% |
| 4 | 7 | 1.6% |
| 5 | 3 | 0.7% |
| 8 | 3 | 0.7% |
| Value | Count | Frequency (%) |
| 0 | 301 | |
| 1 | 104 | 23.3% |
| 2 | 14 | 3.1% |
| 3 | 10 | 2.2% |
| 4 | 11 | 2.5% |
| 5 | 3 | 0.7% |
| 8 | 3 | 0.7% |
| Value | Count | Frequency (%) |
| 0 | 301 | |
| 1 | 104 | 23.3% |
| 2 | 14 | 3.1% |
| 3 | 10 | 2.2% |
| 4 | 11 | 2.5% |
| 5 | 3 | 0.7% |
| 8 | 3 | 0.7% |
| Value | Count | Frequency (%) |
| 0 | 310 | |
| 1 | 99 | 22.2% |
| 2 | 17 | 3.8% |
| 3 | 7 | 1.6% |
| 4 | 7 | 1.6% |
| 5 | 3 | 0.7% |
| 8 | 3 | 0.7% |
Parch
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 7 | 7 |
| Distinct (%) | 1.6% | 1.6% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 0.39237668 | 0.39237668 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| Maximum | 6 | 6 |
| Zeros | 340 | 335 |
| Zeros (%) | 76.2% | 75.1% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| 5-th percentile | 0 | 0 |
| Q1 | 0 | 0 |
| median | 0 | 0 |
| Q3 | 0 | 0 |
| 95-th percentile | 2 | 2 |
| Maximum | 6 | 6 |
| Range | 6 | 6 |
| Interquartile range (IQR) | 0 | 0 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 0.85108534 | 0.80494829 |
| Coefficient of variation (CV) | 2.1690518 | 2.0514682 |
| Kurtosis | 10.949782 | 10.338209 |
| Mean | 0.39237668 | 0.39237668 |
| Median Absolute Deviation (MAD) | 0 | 0 |
| Skewness | 2.9315905 | 2.7294591 |
| Sum | 175 | 175 |
| Variance | 0.72434625 | 0.64794175 |
| Monotonicity | Not monotonic | Not monotonic |
Histogram with fixed size bins (bins=7)
| Value | Count | Frequency (%) |
| 0 | 340 | |
| 1 | 58 | 13.0% |
| 2 | 39 | 8.7% |
| 5 | 3 | 0.7% |
| 4 | 3 | 0.7% |
| 3 | 2 | 0.4% |
| 6 | 1 | 0.2% |
| Value | Count | Frequency (%) |
| 0 | 335 | |
| 1 | 61 | 13.7% |
| 2 | 44 | 9.9% |
| 3 | 2 | 0.4% |
| 5 | 2 | 0.4% |
| 4 | 1 | 0.2% |
| 6 | 1 | 0.2% |
| Value | Count | Frequency (%) |
| 0 | 340 | |
| 1 | 58 | 13.0% |
| 2 | 39 | 8.7% |
| 3 | 2 | 0.4% |
| 4 | 3 | 0.7% |
| 5 | 3 | 0.7% |
| 6 | 1 | 0.2% |
| Value | Count | Frequency (%) |
| 0 | 335 | |
| 1 | 61 | 13.7% |
| 2 | 44 | 9.9% |
| 3 | 2 | 0.4% |
| 4 | 1 | 0.2% |
| 5 | 2 | 0.4% |
| 6 | 1 | 0.2% |
| Value | Count | Frequency (%) |
| 0 | 335 | |
| 1 | 61 | 13.7% |
| 2 | 44 | 9.9% |
| 3 | 2 | 0.4% |
| 4 | 1 | 0.2% |
| 5 | 2 | 0.4% |
| 6 | 1 | 0.2% |
| Value | Count | Frequency (%) |
| 0 | 340 | |
| 1 | 58 | 13.0% |
| 2 | 39 | 8.7% |
| 3 | 2 | 0.4% |
| 4 | 3 | 0.7% |
| 5 | 3 | 0.7% |
| 6 | 1 | 0.2% |
Ticket
Text
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 383 | 380 |
| Distinct (%) | 85.9% | 85.2% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 18 | 18 |
| Median length | 17 | 17 |
| Mean length | 6.7959641 | 6.8318386 |
| Min length | 3 | 4 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 334 | 333 |
| Unique (%) | 74.9% | 74.7% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | 31027 | 239865 |
| 2nd row | 2686 | 28424 |
| 3rd row | PC 17477 | 35851 |
| 4th row | 2624 | 17421 |
| 5th row | 693 | 244361 |
| Value | Count | Frequency (%) |
| pc | 28 | 5.0% |
| c.a | 16 | 2.8% |
| a/5 | 9 | 1.6% |
| ca | 7 | 1.2% |
| w./c | 7 | 1.2% |
| soton/o.q | 7 | 1.2% |
| ston/o | 6 | 1.1% |
| 2 | 6 | 1.1% |
| 1601 | 4 | 0.7% |
| 2144 | 4 | 0.7% |
| Other values (401) | 471 |
| Value | Count | Frequency (%) |
| pc | 27 | 4.7% |
| c.a | 14 | 2.5% |
| a/5 | 8 | 1.4% |
| ca | 8 | 1.4% |
| 2 | 6 | 1.1% |
| ston/o | 6 | 1.1% |
| w./c | 5 | 0.9% |
| sc/paris | 5 | 0.9% |
| 1601 | 5 | 0.9% |
| 347082 | 5 | 0.9% |
| Other values (402) | 481 |
Most occurring characters
| Value | Count | Frequency (%) |
| 3 | 372 | |
| 1 | 338 | |
| 2 | 296 | |
| 7 | 247 | 8.1% |
| 4 | 240 | 7.9% |
| 6 | 221 | 7.3% |
| 0 | 201 | 6.6% |
| 5 | 195 | 6.4% |
| 9 | 155 | 5.1% |
| 8 | 146 | 4.8% |
| Other values (21) | 620 |
| Value | Count | Frequency (%) |
| 3 | 377 | |
| 1 | 350 | |
| 2 | 306 | |
| 7 | 229 | 7.5% |
| 4 | 225 | 7.4% |
| 6 | 213 | 7.0% |
| 5 | 205 | 6.7% |
| 0 | 194 | 6.4% |
| 9 | 163 | 5.3% |
| 8 | 145 | 4.8% |
| Other values (25) | 640 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 3031 |
| Value | Count | Frequency (%) |
| (unknown) | 3047 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| 3 | 372 | |
| 1 | 338 | |
| 2 | 296 | |
| 7 | 247 | 8.1% |
| 4 | 240 | 7.9% |
| 6 | 221 | 7.3% |
| 0 | 201 | 6.6% |
| 5 | 195 | 6.4% |
| 9 | 155 | 5.1% |
| 8 | 146 | 4.8% |
| Other values (21) | 620 |
| Value | Count | Frequency (%) |
| 3 | 377 | |
| 1 | 350 | |
| 2 | 306 | |
| 7 | 229 | 7.5% |
| 4 | 225 | 7.4% |
| 6 | 213 | 7.0% |
| 5 | 205 | 6.7% |
| 0 | 194 | 6.4% |
| 9 | 163 | 5.3% |
| 8 | 145 | 4.8% |
| Other values (25) | 640 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 3031 |
| Value | Count | Frequency (%) |
| (unknown) | 3047 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| 3 | 372 | |
| 1 | 338 | |
| 2 | 296 | |
| 7 | 247 | 8.1% |
| 4 | 240 | 7.9% |
| 6 | 221 | 7.3% |
| 0 | 201 | 6.6% |
| 5 | 195 | 6.4% |
| 9 | 155 | 5.1% |
| 8 | 146 | 4.8% |
| Other values (21) | 620 |
| Value | Count | Frequency (%) |
| 3 | 377 | |
| 1 | 350 | |
| 2 | 306 | |
| 7 | 229 | 7.5% |
| 4 | 225 | 7.4% |
| 6 | 213 | 7.0% |
| 5 | 205 | 6.7% |
| 0 | 194 | 6.4% |
| 9 | 163 | 5.3% |
| 8 | 145 | 4.8% |
| Other values (25) | 640 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 3031 |
| Value | Count | Frequency (%) |
| (unknown) | 3047 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| 3 | 372 | |
| 1 | 338 | |
| 2 | 296 | |
| 7 | 247 | 8.1% |
| 4 | 240 | 7.9% |
| 6 | 221 | 7.3% |
| 0 | 201 | 6.6% |
| 5 | 195 | 6.4% |
| 9 | 155 | 5.1% |
| 8 | 146 | 4.8% |
| Other values (21) | 620 |
| Value | Count | Frequency (%) |
| 3 | 377 | |
| 1 | 350 | |
| 2 | 306 | |
| 7 | 229 | 7.5% |
| 4 | 225 | 7.4% |
| 6 | 213 | 7.0% |
| 5 | 205 | 6.7% |
| 0 | 194 | 6.4% |
| 9 | 163 | 5.3% |
| 8 | 145 | 4.8% |
| Other values (25) | 640 |
Fare
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 179 | 183 |
| Distinct (%) | 40.1% | 41.0% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 31.32685 | 30.19587 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| Maximum | 512.3292 | 512.3292 |
| Zeros | 6 | 8 |
| Zeros (%) | 1.3% | 1.8% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| 5-th percentile | 7.0719 | 7.15 |
| Q1 | 7.8958 | 7.925 |
| median | 13.5 | 15.2458 |
| Q3 | 30.92395 | 30.92395 |
| 95-th percentile | 93.5 | 86.5 |
| Maximum | 512.3292 | 512.3292 |
| Range | 512.3292 | 512.3292 |
| Interquartile range (IQR) | 23.02815 | 22.99895 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 50.063661 | 44.575585 |
| Coefficient of variation (CV) | 1.5981071 | 1.4762146 |
| Kurtosis | 40.93267 | 37.472856 |
| Mean | 31.32685 | 30.19587 |
| Median Absolute Deviation (MAD) | 6.2708 | 7.96455 |
| Skewness | 5.3663318 | 4.998816 |
| Sum | 13971.775 | 13467.358 |
| Variance | 2506.3701 | 1986.9827 |
| Monotonicity | Not monotonic | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 13 | 24 | 5.4% |
| 7.75 | 24 | 5.4% |
| 8.05 | 20 | 4.5% |
| 7.8958 | 17 | 3.8% |
| 10.5 | 13 | 2.9% |
| 7.2292 | 9 | 2.0% |
| 26.55 | 9 | 2.0% |
| 26 | 8 | 1.8% |
| 7.925 | 8 | 1.8% |
| 7.05 | 7 | 1.6% |
| Other values (169) | 307 |
| Value | Count | Frequency (%) |
| 8.05 | 21 | 4.7% |
| 7.8958 | 19 | 4.3% |
| 13 | 17 | 3.8% |
| 26 | 15 | 3.4% |
| 7.75 | 15 | 3.4% |
| 10.5 | 11 | 2.5% |
| 7.225 | 10 | 2.2% |
| 7.775 | 10 | 2.2% |
| 7.925 | 8 | 1.8% |
| 26.55 | 8 | 1.8% |
| Other values (173) | 312 |
| Value | Count | Frequency (%) |
| 0 | 6 | |
| 6.2375 | 1 | 0.2% |
| 6.4375 | 1 | 0.2% |
| 6.45 | 1 | 0.2% |
| 6.75 | 2 | 0.4% |
| 6.95 | 1 | 0.2% |
| 6.975 | 1 | 0.2% |
| 7.0458 | 1 | 0.2% |
| 7.05 | 7 | |
| 7.0542 | 2 | 0.4% |
| Value | Count | Frequency (%) |
| 0 | 8 | |
| 4.0125 | 1 | 0.2% |
| 6.2375 | 1 | 0.2% |
| 6.45 | 1 | 0.2% |
| 6.4958 | 1 | 0.2% |
| 6.75 | 2 | 0.4% |
| 6.975 | 1 | 0.2% |
| 7.0458 | 1 | 0.2% |
| 7.05 | 3 | 0.7% |
| 7.0542 | 1 | 0.2% |
| Value | Count | Frequency (%) |
| 0 | 8 | |
| 4.0125 | 1 | 0.2% |
| 6.2375 | 1 | 0.2% |
| 6.45 | 1 | 0.2% |
| 6.4958 | 1 | 0.2% |
| 6.75 | 2 | 0.4% |
| 6.975 | 1 | 0.2% |
| 7.0458 | 1 | 0.2% |
| 7.05 | 3 | 0.7% |
| 7.0542 | 1 | 0.2% |
| Value | Count | Frequency (%) |
| 0 | 6 | |
| 6.2375 | 1 | 0.2% |
| 6.4375 | 1 | 0.2% |
| 6.45 | 1 | 0.2% |
| 6.75 | 2 | 0.4% |
| 6.95 | 1 | 0.2% |
| 6.975 | 1 | 0.2% |
| 7.0458 | 1 | 0.2% |
| 7.05 | 7 | |
| 7.0542 | 2 | 0.4% |
Cabin
Text
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 88 | 82 |
| Distinct (%) | 89.8% | 86.3% |
| Missing | 348 | 351 |
| Missing (%) | 78.0% | 78.7% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 15 | 15 |
| Median length | 3 | 3 |
| Mean length | 3.4591837 | 3.3473684 |
| Min length | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 78 | 72 |
| Unique (%) | 79.6% | 75.8% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | B35 | C68 |
| 2nd row | C110 | B18 |
| 3rd row | E24 | E24 |
| 4th row | E31 | E67 |
| 5th row | B69 | F4 |
| Value | Count | Frequency (%) |
| c92 | 2 | 1.8% |
| f4 | 2 | 1.8% |
| e101 | 2 | 1.8% |
| e44 | 2 | 1.8% |
| c126 | 2 | 1.8% |
| f33 | 2 | 1.8% |
| c124 | 2 | 1.8% |
| c123 | 2 | 1.8% |
| c78 | 2 | 1.8% |
| e33 | 2 | 1.8% |
| Other values (89) | 90 |
| Value | Count | Frequency (%) |
| d | 3 | 2.8% |
| c22 | 3 | 2.8% |
| c26 | 3 | 2.8% |
| f2 | 3 | 2.8% |
| b77 | 2 | 1.9% |
| b5 | 2 | 1.9% |
| c2 | 2 | 1.9% |
| e8 | 2 | 1.9% |
| b22 | 2 | 1.9% |
| f4 | 2 | 1.9% |
| Other values (83) | 84 |
Most occurring characters
| Value | Count | Frequency (%) |
| C | 35 | |
| 3 | 32 | 9.4% |
| 2 | 31 | 9.1% |
| 1 | 30 | 8.8% |
| 4 | 24 | 7.1% |
| B | 24 | 7.1% |
| 6 | 23 | 6.8% |
| 5 | 18 | 5.3% |
| 7 | 18 | 5.3% |
| E | 17 | 5.0% |
| Other values (9) | 87 |
| Value | Count | Frequency (%) |
| 2 | 40 | |
| B | 30 | 9.4% |
| C | 29 | 9.1% |
| 6 | 24 | 7.5% |
| 8 | 21 | 6.6% |
| 5 | 21 | 6.6% |
| 1 | 20 | 6.3% |
| D | 17 | 5.3% |
| 7 | 17 | 5.3% |
| 4 | 17 | 5.3% |
| Other values (9) | 82 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 339 |
| Value | Count | Frequency (%) |
| (unknown) | 318 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| C | 35 | |
| 3 | 32 | 9.4% |
| 2 | 31 | 9.1% |
| 1 | 30 | 8.8% |
| 4 | 24 | 7.1% |
| B | 24 | 7.1% |
| 6 | 23 | 6.8% |
| 5 | 18 | 5.3% |
| 7 | 18 | 5.3% |
| E | 17 | 5.0% |
| Other values (9) | 87 |
| Value | Count | Frequency (%) |
| 2 | 40 | |
| B | 30 | 9.4% |
| C | 29 | 9.1% |
| 6 | 24 | 7.5% |
| 8 | 21 | 6.6% |
| 5 | 21 | 6.6% |
| 1 | 20 | 6.3% |
| D | 17 | 5.3% |
| 7 | 17 | 5.3% |
| 4 | 17 | 5.3% |
| Other values (9) | 82 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 339 |
| Value | Count | Frequency (%) |
| (unknown) | 318 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| C | 35 | |
| 3 | 32 | 9.4% |
| 2 | 31 | 9.1% |
| 1 | 30 | 8.8% |
| 4 | 24 | 7.1% |
| B | 24 | 7.1% |
| 6 | 23 | 6.8% |
| 5 | 18 | 5.3% |
| 7 | 18 | 5.3% |
| E | 17 | 5.0% |
| Other values (9) | 87 |
| Value | Count | Frequency (%) |
| 2 | 40 | |
| B | 30 | 9.4% |
| C | 29 | 9.1% |
| 6 | 24 | 7.5% |
| 8 | 21 | 6.6% |
| 5 | 21 | 6.6% |
| 1 | 20 | 6.3% |
| D | 17 | 5.3% |
| 7 | 17 | 5.3% |
| 4 | 17 | 5.3% |
| Other values (9) | 82 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 339 |
| Value | Count | Frequency (%) |
| (unknown) | 318 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| C | 35 | |
| 3 | 32 | 9.4% |
| 2 | 31 | 9.1% |
| 1 | 30 | 8.8% |
| 4 | 24 | 7.1% |
| B | 24 | 7.1% |
| 6 | 23 | 6.8% |
| 5 | 18 | 5.3% |
| 7 | 18 | 5.3% |
| E | 17 | 5.0% |
| Other values (9) | 87 |
| Value | Count | Frequency (%) |
| 2 | 40 | |
| B | 30 | 9.4% |
| C | 29 | 9.1% |
| 6 | 24 | 7.5% |
| 8 | 21 | 6.6% |
| 5 | 21 | 6.6% |
| 1 | 20 | 6.3% |
| D | 17 | 5.3% |
| 7 | 17 | 5.3% |
| 4 | 17 | 5.3% |
| Other values (9) | 82 |
Embarked
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 3 | 3 |
| Distinct (%) | 0.7% | 0.7% |
| Missing | 1 | 1 |
| Missing (%) | 0.2% | 0.2% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 1 | 1 |
| Median length | 1 | 1 |
| Mean length | 1 | 1 |
| Min length | 1 | 1 |
Unique
| Dataset A | Dataset B | |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| Dataset A | Dataset B | |
|---|---|---|
| 1st row | S | S |
| 2nd row | C | S |
| 3rd row | C | Q |
| 4th row | C | C |
| 5th row | S | S |
Common Values
| Value | Count | Frequency (%) |
| S | 324 | 72.6% |
| C | 76 | 17.0% |
| Q | 45 | 10.1% |
| (Missing) | 1 | 0.2% |
| Value | Count | Frequency (%) |
| S | 324 | 72.6% |
| C | 87 | 19.5% |
| Q | 34 | 7.6% |
| (Missing) | 1 | 0.2% |
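The Common Values percentages above differ slightly from those in the character tables further down (C is 17.0% here but 17.1% there). The arithmetic suggests that Common Values divides by all 446 rows, while the character tables divide by the 445 non-missing entries. The sketch below reproduces both figures for Dataset A under that assumption. The helper `frequency_table` and its `include_missing` flag are hypothetical names used for illustration and are not part of ydata-profiling.

```python
from collections import Counter

# Embarked counts copied from the Dataset A "Common Values" table;
# None stands in for the single missing entry.
values = ["S"] * 324 + ["C"] * 76 + ["Q"] * 45 + [None]


def frequency_table(values, include_missing=True):
    """Return {value: (count, percent)} ordered by descending count."""
    if not include_missing:
        values = [v for v in values if v is not None]
    total = len(values)
    counts = Counter(values)
    return {
        value: (count, round(100 * count / total, 1))
        for value, count in counts.most_common()
    }


with_missing = frequency_table(values)                            # denominator: 446 rows
without_missing = frequency_table(values, include_missing=False)  # denominator: 445 rows

print(with_missing["C"])     # share of all rows
print(without_missing["C"])  # share of non-missing rows
```

Running the same helper on the Dataset B counts (324, 87, 34, and one missing) should give the corresponding Dataset B percentages.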
Length
[Histogram of lengths of the category; image not reproduced.]
Common Values (Plot)
[Plots for Dataset A and Dataset B; images not reproduced.]
| Value | Count | Frequency (%) |
| s | 324 | 72.8% |
| c | 76 | 17.1% |
| q | 45 | 10.1% |
| Value | Count | Frequency (%) |
| s | 324 | 72.8% |
| c | 87 | 19.6% |
| q | 34 | 7.6% |
Most occurring characters
| Value | Count | Frequency (%) |
| S | 324 | 72.8% |
| C | 76 | 17.1% |
| Q | 45 | 10.1% |
| Value | Count | Frequency (%) |
| S | 324 | 72.8% |
| C | 87 | 19.6% |
| Q | 34 | 7.6% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 445 | 100.0% |
| Value | Count | Frequency (%) |
| (unknown) | 445 | 100.0% |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| S | 324 | 72.8% |
| C | 76 | 17.1% |
| Q | 45 | 10.1% |
| Value | Count | Frequency (%) |
| S | 324 | 72.8% |
| C | 87 | 19.6% |
| Q | 34 | 7.6% |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 445 | 100.0% |
| Value | Count | Frequency (%) |
| (unknown) | 445 | 100.0% |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| S | 324 | 72.8% |
| C | 76 | 17.1% |
| Q | 45 | 10.1% |
| Value | Count | Frequency (%) |
| S | 324 | 72.8% |
| C | 87 | 19.6% |
| Q | 34 | 7.6% |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 445 | 100.0% |
| Value | Count | Frequency (%) |
| (unknown) | 445 | 100.0% |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| S | 324 | 72.8% |
| C | 76 | 17.1% |
| Q | 45 | 10.1% |
| Value | Count | Frequency (%) |
| S | 324 | 72.8% |
| C | 87 | 19.6% |
| Q | 34 | 7.6% |
Interactions
[Interaction plots: 25 panels, each shown for Dataset A and Dataset B; images not reproduced.]
Correlations
[Correlation heatmaps for Dataset A and Dataset B; images not reproduced.]
Dataset A
| | Age | Embarked | Fare | Parch | PassengerId | Pclass | Sex | SibSp | Survived |
|---|---|---|---|---|---|---|---|---|---|
| Age | 1.000 | 0.000 | 0.174 | -0.242 | 0.032 | 0.242 | 0.149 | -0.168 | 0.096 |
| Embarked | 0.000 | 1.000 | 0.196 | 0.000 | 0.000 | 0.257 | 0.105 | 0.000 | 0.220 |
| Fare | 0.174 | 0.196 | 1.000 | 0.386 | -0.014 | 0.474 | 0.209 | 0.424 | 0.252 |
| Parch | -0.242 | 0.000 | 0.386 | 1.000 | 0.046 | 0.000 | 0.323 | 0.477 | 0.122 |
| PassengerId | 0.032 | 0.000 | -0.014 | 0.046 | 1.000 | 0.011 | 0.083 | -0.069 | 0.152 |
| Pclass | 0.242 | 0.257 | 0.474 | 0.000 | 0.011 | 1.000 | 0.096 | 0.121 | 0.300 |
| Sex | 0.149 | 0.105 | 0.209 | 0.323 | 0.083 | 0.096 | 1.000 | 0.236 | 0.482 |
| SibSp | -0.168 | 0.000 | 0.424 | 0.477 | -0.069 | 0.121 | 0.236 | 1.000 | 0.178 |
| Survived | 0.096 | 0.220 | 0.252 | 0.122 | 0.152 | 0.300 | 0.482 | 0.178 | 1.000 |
Dataset B
| | Age | Embarked | Fare | Parch | PassengerId | Pclass | Sex | SibSp | Survived |
|---|---|---|---|---|---|---|---|---|---|
| Age | 1.000 | 0.114 | 0.093 | -0.323 | -0.003 | 0.253 | 0.089 | -0.255 | 0.227 |
| Embarked | 0.114 | 1.000 | 0.091 | 0.016 | 0.000 | 0.221 | 0.000 | 0.078 | 0.103 |
| Fare | 0.093 | 0.091 | 1.000 | 0.426 | -0.015 | 0.426 | 0.143 | 0.458 | 0.214 |
| Parch | -0.323 | 0.016 | 0.426 | 1.000 | 0.016 | 0.037 | 0.329 | 0.500 | 0.213 |
| PassengerId | -0.003 | 0.000 | -0.015 | 0.016 | 1.000 | 0.000 | 0.000 | -0.037 | 0.144 |
| Pclass | 0.253 | 0.221 | 0.426 | 0.037 | 0.000 | 1.000 | 0.109 | 0.146 | 0.303 |
| Sex | 0.089 | 0.000 | 0.143 | 0.329 | 0.000 | 0.109 | 1.000 | 0.198 | 0.474 |
| SibSp | -0.255 | 0.078 | 0.458 | 0.500 | -0.037 | 0.146 | 0.198 | 1.000 | 0.151 |
| Survived | 0.227 | 0.103 | 0.214 | 0.213 | 0.144 | 0.303 | 0.474 | 0.151 | 1.000 |
Missing values
[Plots for Dataset A and Dataset B: a simple visualization of nullity by column.]
[Plots for Dataset A and Dataset B: the nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.]
[Plots for Dataset A and Dataset B: the correlation heatmap measures nullity correlation, that is, how strongly the presence or absence of one variable affects the presence of another.]
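The nullity correlation described above can be sketched in a few lines. This is a minimal illustration, assuming the heatmap uses the Pearson correlation between the present/missing indicators of two columns; the report itself does not state the exact formula. The function name `nullity_correlation` and the toy columns are hypothetical and are not taken from the datasets.

```python
from math import sqrt


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the presence indicators of two columns.

    Returns None when either column is entirely present or entirely
    missing, because its indicator then has zero variance.
    """
    a = [0.0 if v is None else 1.0 for v in col_a]
    b = [0.0 if v is None else 1.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return None
    return cov / sqrt(var_a * var_b)


# Hypothetical toy columns; None marks a missing value.
age = [22.0, None, 26.0, None, 35.0, 54.0]
cabin = ["C85", None, None, None, "E46", "B5"]
fare = [7.25, 71.28, 7.92, 53.1, 8.05, 8.46]

age_cabin = nullity_correlation(age, cabin)  # positive: the gaps tend to coincide
age_fare = nullity_correlation(age, fare)    # None: fare has no missing values
print(age_cabin, age_fare)
```

A value near 1 means two columns tend to be missing together, a value near -1 means one tends to be present when the other is missing, and a value near 0 means no linear relationship between the gaps. With pandas, `df.isnull().corr()` computes the same quantity for every pair of columns at once.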
Sample
Dataset A
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 476 | 477 | 0 | 2 | Renouf, Mr. Peter Henry | male | 34.0 | 1 | 0 | 31027 | 21.0000 | NaN | S |
| 568 | 569 | 0 | 3 | Doharr, Mr. Tannous | male | NaN | 0 | 0 | 2686 | 7.2292 | NaN | C |
| 641 | 642 | 1 | 1 | Sagesser, Mlle. Emma | female | 24.0 | 0 | 0 | PC 17477 | 69.3000 | B35 | C |
| 522 | 523 | 0 | 3 | Lahoud, Mr. Sarkis | male | NaN | 0 | 0 | 2624 | 7.2250 | NaN | C |
| 545 | 546 | 0 | 1 | Nicholson, Mr. Arthur Ernest | male | 64.0 | 0 | 0 | 693 | 26.0000 | NaN | S |
| 258 | 259 | 1 | 1 | Ward, Miss. Anna | female | 35.0 | 0 | 0 | PC 17755 | 512.3292 | NaN | C |
| 110 | 111 | 0 | 1 | Porter, Mr. Walter Chamberlain | male | 47.0 | 0 | 0 | 110465 | 52.0000 | C110 | S |
| 529 | 530 | 0 | 2 | Hocking, Mr. Richard George | male | 23.0 | 2 | 1 | 29104 | 11.5000 | NaN | S |
| 98 | 99 | 1 | 2 | Doling, Mrs. John T (Ada Julia Bone) | female | 34.0 | 0 | 1 | 231919 | 23.0000 | NaN | S |
| 888 | 889 | 0 | 3 | Johnston, Miss. Catherine Helen "Carrie" | female | NaN | 1 | 2 | W./C. 6607 | 23.4500 | NaN | S |
Dataset B
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 791 | 792 | 0 | 2 | Gaskell, Mr. Alfred | male | 16.0 | 0 | 0 | 239865 | 26.0000 | NaN | S |
| 191 | 192 | 0 | 2 | Carbines, Mr. William | male | 19.0 | 0 | 0 | 28424 | 13.0000 | NaN | S |
| 156 | 157 | 1 | 3 | Gilnagh, Miss. Katherine "Katie" | female | 16.0 | 0 | 0 | 35851 | 7.7333 | NaN | Q |
| 581 | 582 | 1 | 1 | Thayer, Mrs. John Borland (Marian Longstreth Morris) | female | 39.0 | 1 | 1 | 17421 | 110.8833 | C68 | C |
| 343 | 344 | 0 | 2 | Sedgwick, Mr. Charles Frederick Waddington | male | 25.0 | 0 | 0 | 244361 | 13.0000 | NaN | S |
| 29 | 30 | 0 | 3 | Todoroff, Mr. Lalio | male | NaN | 0 | 0 | 349216 | 7.8958 | NaN | S |
| 886 | 887 | 0 | 2 | Montvila, Rev. Juozas | male | 27.0 | 0 | 0 | 211536 | 13.0000 | NaN | S |
| 747 | 748 | 1 | 2 | Sinkkonen, Miss. Anna | female | 30.0 | 0 | 0 | 250648 | 13.0000 | NaN | S |
| 98 | 99 | 1 | 2 | Doling, Mrs. John T (Ada Julia Bone) | female | 34.0 | 0 | 1 | 231919 | 23.0000 | NaN | S |
| 542 | 543 | 0 | 3 | Andersson, Miss. Sigrid Elisabeth | female | 11.0 | 4 | 2 | 347082 | 31.2750 | NaN | S |
Dataset A
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 619 | 620 | 0 | 2 | Gavey, Mr. Lawrence | male | 26.0 | 0 | 0 | 31028 | 10.5000 | NaN | S |
| 266 | 267 | 0 | 3 | Panula, Mr. Ernesti Arvid | male | 16.0 | 4 | 1 | 3101295 | 39.6875 | NaN | S |
| 615 | 616 | 1 | 2 | Herman, Miss. Alice | female | 24.0 | 1 | 2 | 220845 | 65.0000 | NaN | S |
| 564 | 565 | 0 | 3 | Meanwell, Miss. (Marion Ogden) | female | NaN | 0 | 0 | SOTON/O.Q. 392087 | 8.0500 | NaN | S |
| 288 | 289 | 1 | 2 | Hosono, Mr. Masabumi | male | 42.0 | 0 | 0 | 237798 | 13.0000 | NaN | S |
| 706 | 707 | 1 | 2 | Kelly, Mrs. Florence "Fannie" | female | 45.0 | 0 | 0 | 223596 | 13.5000 | NaN | S |
| 125 | 126 | 1 | 3 | Nicola-Yarred, Master. Elias | male | 12.0 | 1 | 0 | 2651 | 11.2417 | NaN | C |
| 280 | 281 | 0 | 3 | Duane, Mr. Frank | male | 65.0 | 0 | 0 | 336439 | 7.7500 | NaN | Q |
| 517 | 518 | 0 | 3 | Ryan, Mr. Patrick | male | NaN | 0 | 0 | 371110 | 24.1500 | NaN | Q |
| 123 | 124 | 1 | 2 | Webber, Miss. Susan | female | 32.5 | 0 | 0 | 27267 | 13.0000 | E101 | S |
Dataset B
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 854 | 855 | 0 | 2 | Carter, Mrs. Ernest Courtenay (Lilian Hughes) | female | 44.0 | 1 | 0 | 244252 | 26.0000 | NaN | S |
| 506 | 507 | 1 | 2 | Quick, Mrs. Frederick Charles (Jane Richards) | female | 33.0 | 0 | 2 | 26360 | 26.0000 | NaN | S |
| 480 | 481 | 0 | 3 | Goodwin, Master. Harold Victor | male | 9.0 | 5 | 2 | CA 2144 | 46.9000 | NaN | S |
| 328 | 329 | 1 | 3 | Goldsmith, Mrs. Frank John (Emily Alice Brown) | female | 31.0 | 1 | 1 | 363291 | 20.5250 | NaN | S |
| 646 | 647 | 0 | 3 | Cor, Mr. Liudevit | male | 19.0 | 0 | 0 | 349231 | 7.8958 | NaN | S |
| 362 | 363 | 0 | 3 | Barbara, Mrs. (Catherine David) | female | 45.0 | 0 | 1 | 2691 | 14.4542 | NaN | C |
| 427 | 428 | 1 | 2 | Phillips, Miss. Kate Florence ("Mrs Kate Louise Phillips Marshall") | female | 19.0 | 0 | 0 | 250655 | 26.0000 | NaN | S |
| 730 | 731 | 1 | 1 | Allen, Miss. Elisabeth Walton | female | 29.0 | 0 | 0 | 24160 | 211.3375 | B5 | S |
| 352 | 353 | 0 | 3 | Elias, Mr. Tannous | male | 15.0 | 1 | 1 | 2695 | 7.2292 | NaN | C |
| 498 | 499 | 0 | 1 | Allison, Mrs. Hudson J C (Bessie Waldo Daniels) | female | 25.0 | 1 | 2 | 113781 | 151.5500 | C22 C26 | S |
Duplicate rows
Dataset A
Dataset does not contain duplicate rows.
Dataset B
Dataset does not contain duplicate rows.